Abstract

In this paper we propose a family of robust estimates for isotonic regression: isotonic M-estimators.
We show that their asymptotic distribution is, up to a scalar factor, the same as that of Brunk's classical
isotonic estimator. We also derive the influence function and the breakdown point of these estimates.
Finally, we perform a Monte Carlo study which shows that the proposed family includes estimators that
are simultaneously highly efficient under Gaussian errors and highly robust when the error distribution
has heavy tails.
1 Introduction

Let $x_1, \ldots, x_n$ be independent random variables collected along observation points $t_1 < \ldots < t_n$ according
to the model

$x_j = \mu(t_j) + u_j$, (1)

where the $u_j$'s are i.i.d. symmetric random variables with distribution $G$. In isotonic regression the trend
term $\mu(t)$ is monotone non-decreasing, i.e., $\mu(t_1) \le \ldots \le \mu(t_n)$, but it is otherwise arbitrary. In this
setup, the classical estimator of $\mu(t)$ is the function $g$ which minimizes the $L_2$ distance between the vectors of
observed and fitted responses, i.e., it minimizes

$\sum_{j=1}^{n} (x_j - g(t_j))^2$ (2)


in the class $\mathcal{G}$ of non-decreasing piecewise continuous functions. It is trivial but noteworthy that Equation
(2) poses a finite-dimensional convex constrained optimization problem. Its solution was first proposed by
Brunk (1958) and has received extensive attention in the statistical literature (see, e.g., Robertson, Wright
and Dykstra (1988) for a comprehensive account). It is also worth noting that any piecewise continuous
non-decreasing function which agrees with the optimizer of (2) at the $t_j$'s will be a solution. For that reason,
in order to achieve uniqueness, it is traditional to restrict the class $\mathcal{G}$ further to the subset $\mathcal{G}_0$ of piecewise
constant non-decreasing functions. Another valid choice consists in interpolating at the knots with
non-decreasing cubic splines or any other piecewise continuous monotone function, e.g., Meyer (1996). We will
call this estimator the isotonic estimator.
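
As an illustration of how the minimizer of (2) can be computed at the observation points, the following is a minimal sketch of the pool-adjacent-violators algorithm. This is a standard method for the $L_2$ problem and is not taken from this paper; the function name `pava` and the unit weights are assumptions made for the example.

```python
def pava(x):
    """Least-squares non-decreasing fit to x (unit weights), by pooling adjacent violators."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in x:
        blocks.append([float(value), 1])
        # Merge while the previous block mean exceeds the last block mean.
        # The comparison of means is done by cross-multiplication.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every point in a block gets the block mean
    return fit
```

The fitted values are constant on each pooled block, which corresponds to the piecewise constant solution described above.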
The sensitivity of this estimator to extreme observations (outliers) was noted by Wang and Huang (2002),
who propose using the $L_1$ norm instead, i.e., minimizing

$\sum_{j=1}^{n} |x_j - g(t_j)|$.

This estimator will be called here the $L_1$ isotonic estimator. Wang and Huang (2002) derived the asymptotic
distribution of the trend estimator at a given observation point and obtained the asymptotic relative
efficiency of this estimator compared with the classical $L_2$ estimator. Interestingly, this efficiency turned out
to be $2/\pi \approx 0.637$, the same as in the i.i.d. location problem.

In this paper we propose instead a robust isotonic M-estimator aimed at balancing robustness with
efficiency. Specifically, we shall seek the minimizer of

$\sum_{j=1}^{n} \rho\left(\frac{x_j - g(t_j)}{\hat\sigma_n}\right)$, (3)

where $\hat\sigma_n$ is a previously obtained estimator of the error scale and $\rho$ satisfies the following properties:

A1 (i) $\rho(x)$ is non-decreasing in $|x|$, (ii) $\rho(0) = 0$, (iii) $\rho$ is even, (iv) $\rho(x)$ is strictly increasing for $x > 0$ and
(v) $\rho$ has two continuous derivatives and $\psi = \rho'$ is bounded and monotone non-decreasing.

Clearly, the $L_2$ choice corresponds to taking $\rho(x) = x^2$ while the $L_1$ option corresponds to $\rho(x) = |x|$.
These two estimators do not require the scale estimator $\hat\sigma_n$.

Note that the class of M-estimators satisfying A1 does not include estimators with a redescending choice
for $\psi$. We believe that the strict differentiability conditions on $\rho$ required in A1 are not strictly necessary,
but they make the proofs for the asymptotic theory simpler. Moreover, some functions $\rho$ which are not twice
differentiable everywhere, such as $|x|$ or the Huber functions defined below in (7), can be approximated by
functions satisfying A1.

The asymptotic distribution of the $L_2$ isotonic estimator at a given point was found by Brunk (1970) and
Wright (1981), and that of the $L_1$ estimator by Wang and Huang (2002). They prove that the distribution of these
estimators, conveniently normalized, converges to the distribution of the slope at zero of the greatest convex
minorant of the two-sided Brownian motion with parabolic drift. In this paper, we prove a similar result
for isotonic M-estimators. The focus of this paper is on estimation of the trend term at a single observation
point $t_0$. We do not address the issue of the distribution of the whole stochastic process $\{\hat\mu_n(t), t \in T\}$. Recent
research along those lines is given by Kulikova and Lopuhaa (2006), and a related result with smoothing
was obtained simultaneously in Pal and Woodroofe (2006).

This article is structured as follows. In Section 2 we propose the robust isotonic M-estimator. In Section
3 we obtain the limiting distribution of the isotonic M-estimator when the error scale is known. In Section
4 we prove that under general conditions the M-estimators with estimated scale have the same asymptotic
distribution as when the scale is known. In Section 5 we define an influence function which measures the
sensitivity of the isotonic M-estimator to an infinitesimal amount of pointwise contamination. In Section 6
we calculate the breakdown point of the isotonic M-estimators. In Section 7 we analyze two real datasets
using the $L_2$ and the isotonic M-estimators. In Section 8 we compare by Monte Carlo simulations the finite
sample variances of the estimators for two error distributions: normal and Student with three degrees of
freedom. Section 9 is an Appendix containing the proofs.

2 Isotonic M-Estimators



In analogy with the classical setup, we consider isotonic M-estimators that minimize the objective function
(3) within the class $\mathcal{G}_0$ of piecewise constant non-decreasing functions. As in the $L_2$ and $L_1$ cases, the isotonic
M-estimator is a step function with knots at (some of) the $t_j$'s. In Robertson and Waltman (1968) it is
shown that maximum-likelihood-type estimation under isotonic restrictions can be calculated via min-max
formulae. Assume first that the scale parameter (e.g., the MAD) of the $u_i$'s is known to be $\sigma_0$. Since we
are considering M-estimators with $\psi$ non-decreasing (see A1), they can be viewed as the maximum likelihood
estimators corresponding to errors with density

$g(u) = \frac{\exp\left(-\frac{1}{\sigma_0} \int_0^u \psi(v/\sigma_0)\,dv\right)}{\int_{-\infty}^{\infty} \exp\left(-\frac{1}{\sigma_0} \int_0^u \psi(v/\sigma_0)\,dv\right) du}$.

Then we can compute the isotonic M-estimator at a point $t$ using the min-max calculation formulae

$\hat\mu_n(t) = \max_{u \le t} \min_{v \ge t} \hat\mu_n(u, v) = \min_{v \ge t} \max_{u \le t} \hat\mu_n(u, v)$, (4)

where $\hat\mu_n(u, v)$ is the unrestricted M-estimator which minimizes

$\sum_{j \in C(u,v)} \rho\left(\frac{x_j - \mu}{\sigma_0}\right)$, (5)

where $C(u, v) = \{j : 1 \le j \le n;\ u \le t_j \le v\}$. Alternatively, if $\rho$ is convex and differentiable, as we are
assuming, the terms $\hat\mu_n(u, v)$ in (4) can be represented uniquely as a zero of

$S_n(u, v, \mu) = \sum_{j \in C(u,v)} \psi\left(\frac{x_j - \mu}{\sigma_0}\right)$. (6)



In particular, when $\rho(u) = -\log(g(u)) + \log(g(0))$, where $g$ is a probability density, the isotonic M-estimator
coincides with the maximum likelihood estimator when $u$ is assumed to have density $g$. In particular, if $g$
is the N$(0, \sigma_0^2)$ density, the MLE is the M-estimator defined by $\rho(u) = u^2$ and therefore it coincides
with the classical $L_2$ estimator. When $g$ is the density of a double exponential distribution, the MLE is
the M-estimator defined by $\rho(u) = |u|$ and therefore it coincides with the $L_1$ isotonic estimator. In these
two cases the estimators are independent of the value of $\sigma_0$. One popular family of $\psi$ functions to define
M-estimators is the Huber family

$\psi_k^H(u) = \mathrm{sign}(u) \min(|u|, k)$. (7)

Clearly, when $\sigma_0$ is replaced by $\hat\sigma_n$, equations (4)-(6) still hold with $\sigma_0$ replaced by $\hat\sigma_n$. Since $\psi$ is
non-decreasing, the function $S_n(u, v, \mu)$ defined in equation (6) is non-increasing as a function of $\mu$. This entails
the fundamental identities given below:

$S_n(u, v, a) > 0$ if and only if $\hat\mu_n(u, v) > a$, (8)
$S_n(u, v, a) < 0$ if and only if $\hat\mu_n(u, v) < a$. (9)

These identities will be very useful in the development of the asymptotic distribution.
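
The min-max formula (4), together with the monotonicity of $S_n$ in $\mu$, suggests a direct (if slow) way to compute the estimator at one observation point. The sketch below is only an illustration under stated assumptions: observations are indexed 0 to n-1 in the order of the $t_j$'s, the scale is treated as known, $\psi$ is the Huber function (7), and the helper names `huber_psi`, `location_m` and `isotonic_m_at` are hypothetical. When the zero set of $S_n$ is an interval, the bisection returns its left end point.

```python
def huber_psi(u, k=0.98):
    """Huber psi function (7): sign(u) * min(|u|, k)."""
    return max(-k, min(k, u))

def location_m(xs, sigma, k=0.98, tol=1e-10):
    """Zero in mu of sum psi((x - mu) / sigma), found by bisection.

    The sum is non-increasing in mu, positive below min(xs) and
    negative above max(xs), so a zero lies in [min(xs), max(xs)].
    """
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        s = sum(huber_psi((x - mid) / sigma, k) for x in xs)
        if s > 0:
            lo = mid  # by identity (8) the zero is above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def isotonic_m_at(x, i, sigma, k=0.98):
    """Max over u <= i of min over v >= i of the window M-estimate, as in (4)."""
    n = len(x)
    best = float("-inf")
    for u in range(i + 1):
        inner = min(location_m(x[u:v + 1], sigma, k) for v in range(i, n))
        best = max(best, inner)
    return best
```

With a very large `k` the Huber function is linear over the data range, so the window M-estimates are window means and the result agrees with the $L_2$ isotonic estimator; with a small `k` a single large observation has limited effect.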
3 Asymptotic Distribution

In this section we derive the asymptotic distribution of the isotonic M-estimator $\hat\mu_n(t_0)$ of $\mu(t_0)$. We first
make the sample size $n$ explicit in the formulation of the model by postulating

$x_{n,i} = \mu(t_{n,i}) + u_{n,i}$, (10)

where the errors $\{u_{n,i}, 1 \le i \le n\}$ form a triangular array of i.i.d. random variables with distribution $G$ and
$\{t_{n,i}, 1 \le i \le n\}$ is a triangular array of observation points. Their exact location is described by the function
$H_n(t) = n^{-1} \sum_{i=1}^{n} \mathbf{1}(t_{n,i} \le t)$. The values $t_{n,i}$ may be fixed or random but we will assume that there exists
a continuous distribution function $H$ which has as support a finite closed interval such that

$\sup_t |H_n(t) - H(t)| = o_p(n^{-1/3})$. (11)

Without loss of generality we shall assume in the sequel that this interval is $[0, 1]$.

We will study the asymptotic distribution of $\hat\mu_n(t_0)$ where $t_0$ is an interior point of $[0, 1]$. The classical $L_2$
isotonic estimator $\hat\mu_n(t_0)$, with $t_0$ at the boundary of the support of $H$, is known to suffer from the so-called
spiking problem (e.g., Sun and Woodroofe, 1999), i.e., $\hat\mu_n(t_0)$ is not even consistent. We further make the
following assumptions.

A2 The function $H$ is continuously differentiable in a neighborhood of $t_0$ with $h(t_0) = H'(t_0) > 0$.

A3 For a fixed $t_0$, we assume the function $\mu(t)$ has two continuous derivatives in a neighborhood of $t_0$, and
$\mu'(t_0) > 0$.

A4 The error distribution $G$ has a density $g$ symmetric and continuous with $g(0) > 0$.

We consider first the case where $\sigma_0$ is known. Our first aim is to show that isotonic M-estimation is
asymptotically a local problem. Specifically, we will see in Lemma 1 that $\hat\mu_n(t_0)$ depends only on those $x_j$
corresponding to observation points $t_j$ lying in a neighborhood of order $n^{-1/3}$ about $t_0$. This result is similar
to Prakasa Rao (1969), Lemma 4.1, who stated it in the context of density estimation. Our treatment here
will parallel that of Wright (1981), who worked on the asymptotics of the $L_2$ isotonic regression estimator
when the smoothness of the underlying trend function $\mu(\cdot)$ is specified via the number of its continuous
derivatives.

Specifically, since $H'(t_0) > 0$ we may choose, for an arbitrary $c > 0$ and $n$ sufficiently large, positive numbers
$a_l(n)$ and $a_u(n)$ for which

$H(t_0) - H(t_0 - a_l(n)) = H(t_0 + a_u(n)) - H(t_0) = 2c n^{-1/3}$.

With this, define the localized version of the isotonic M-estimator as

$\hat\mu_n^c(t_0) = \max_{t_0 - a_l(n) \le u \le t_0}\ \min_{t_0 \le v \le t_0 + a_u(n)} \hat\mu_n(u, v)$. (12)

Then we have the following lemma.

Lemma 1 Assume A1-A4 and (11). Then if $\hat\mu_n^c(t_0)$ is defined by (12) we have

$\lim_{c \to \infty} \limsup_{n \to \infty} P[\hat\mu_n(t_0) \ne \hat\mu_n^c(t_0)] = 0$. (13)

It is also noteworthy that the estimator in Equation (12) is not computable, for $a_l$ and $a_u$ depend on the
distribution $H$, which is generally unknown. For computational purposes this implies that the calculation
of these estimators will indeed be global for fixed sample sizes. Lemma 1 is, however, crucial to study the
asymptotic properties of $\hat\mu_n(t)$.

Given a stochastic process $\{Z(v), -\infty < v < \infty\}$, we denote by "slogcm[Z(t)]" the random variable
that corresponds to the slope at zero of the greatest convex minorant of $Z(t)$. The following theorem gives
the asymptotic distribution of $\hat\mu_n(t_0)$.



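Theorem 1 Assume A1-A4 and (11). Let $\hat\mu_n(t_0)$ be given by (4), then

$n^{1/3} \left[\frac{2 h(t_0) \left(E_G \psi'(u/\sigma_0)\right)^2}{\mu'(t_0)\, \sigma_0^2\, E_G \psi^2(u/\sigma_0)}\right]^{1/3} (\hat\mu_n(t_0) - \mu(t_0)) \to_d \mathrm{slogcm}\left(W(v) + v^2\right)$, (14)

where $W(v)$ is a two-sided standard Brownian motion.

Remark 1 Notice that in the case of the $L_2$ isotonic estimator the function $\rho(x) = x^2$, so $\psi(x) = 2x$ and
$\psi'(x) = 2$, so that $E_G(\psi'(u)) = 2$ and $E_G(\psi^2(u)) = 4\sigma^2$, where $\sigma^2$ is the variance of $u$. Then the standardizing constant is given by

$n^{1/3} \left[\frac{2 h(t_0)}{\mu'(t_0)\, \sigma^2}\right]^{1/3}$,

as is known for the $L_2$ isotonic estimator.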

Remark 2 In the case of $L_1$ isotonic regression, notice that the function $\rho(x) = |x|$, so $\psi(x) = \mathrm{sign}(x)$
for $x \ne 0$ and is left undefined otherwise. Our method is thus not applicable, as the assumptions on $\psi$ do not hold.
However, consider a sequence of functions $\psi_m(x)$ for which

$\psi_m(x) = \begin{cases} -1 & \text{if } x < -1/m, \\ mx & \text{if } -1/m + 1/m^2 < x < 1/m - 1/m^2, \\ 1 & \text{if } x > 1/m, \end{cases}$ (15)

and so that there is continuity of the first 3 derivatives everywhere; for a construction of such functions
it is enough to consider quartic splines (e.g., De Boor, 2001). In this setup we get

$\lim_{m \to \infty} E_G(\psi_m^2(u)) = 1$,
$\lim_{m \to \infty} E_G(\psi_m'(u)) = \lim_{m \to \infty} m\,[G(1/m - 1/m^2) - G(-1/m + 1/m^2)] = 2G'(0)$.

Letting $m \to \infty$ and $n \to \infty$ so that $m/n \to \infty$, we obtain

$\lim_{m \to \infty} \frac{E_G(\psi_m^2(u))}{2 \left(E_G(\psi_m'(u))\right)^2} = \frac{1}{8\,[G'(0)]^2}$,

as is known in the case of $L_1$ isotonic regression (see Wang and Huang, 2002).
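
The limit of $E_G(\psi_m'(u))$ used in Remark 2 can be checked numerically. The sketch below assumes, purely for illustration, that $G$ is the standard normal distribution, for which $2G'(0) = 2/\sqrt{2\pi}$; the function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_psi_prime(m):
    """m * [G(1/m - 1/m^2) - G(-1/m + 1/m^2)] for G standard normal."""
    d = 1.0 / m - 1.0 / m ** 2
    return m * (std_normal_cdf(d) - std_normal_cdf(-d))

# 2 G'(0) for the standard normal density
limit = 2.0 / math.sqrt(2.0 * math.pi)
```

Evaluating `expected_psi_prime(m)` for increasing `m` gives values that approach `limit`, in line with the displayed limit.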

Remark 3 A construction similar to Equation (15) may be applied to the functions $\psi_k^H$ in the Huber
family.
4 Robust Isotonic M-Estimators with a Previous Scale Estimator

We will now consider the more realistic case where $\sigma_0$ is not known and is replaced by a previously
calculated estimator $\hat\sigma_n$. Then, in order to obtain a scale equivariant estimator, we should replace $\sigma_0$ in (5)
and (6) by a robust scale equivariant estimator $\hat\sigma_n$. In Remarks 4 and 5 below we give some possible choices
for $\hat\sigma_n$.

In the next theorem it is shown that, under suitable regularity conditions, if $\hat\sigma_n$
converges to $\sigma_0$ fast enough, both isotonic M-estimators, the one using the fixed scale $\sigma_0$ and the one using
the scale $\hat\sigma_n$, have the same asymptotic distribution. Making the scale explicit in the notation, denote the
isotonic M-estimator of $\mu(t)$ based on a fixed scale $\sigma$ by $\hat\mu_n(t, \sigma)$. Then

$\hat\mu_n(t, \sigma) = \min_{v \ge t} \max_{u \le t} \hat\mu_n(u, v, \sigma) = \max_{u \le t} \min_{v \ge t} \hat\mu_n(u, v, \sigma)$,

where $\hat\mu_n(u, v, \sigma)$ solves

$\sum_{j \in C(u,v)} \psi\left(\frac{x_j - \mu}{\sigma}\right) = 0$

over $C(u, v) := \{j : 1 \le j \le n;\ u \le t_j \le v\}$.

We need the following additional assumptions:

A5 There exists $k > 0$ such that $\psi'(u) > 0$ for $|u| < k$ and $\psi'(u) = 0$ if $|u| > k$.

A6 The estimator $\hat\sigma_n$ satisfies $n^{1/3}(\hat\sigma_n - \sigma_0) = o_p(1)$.

Then we have the following theorem:

Theorem 2 Assume A1-A6. Then

$n^{1/3} |\hat\mu_n(t, \sigma_0) - \hat\mu_n(t, \hat\sigma_n)| = o_p(1)$.

Assume also that (11) holds; then both estimators have the same asymptotic distribution.

Remark 4 In the context of nonparametric regression, Ghement, Ruiz and Zamar (2008) propose to use as
scale estimator $\hat\sigma_n$ given by

$\hat\sigma_n = \frac{1}{\sqrt{2}}\, s(x_2 - x_1, \ldots, x_n - x_{n-1})$,

where $s$ is an M-estimator of scale, i.e., $s(u_1, \ldots, u_n)$ is defined as the value $s$ satisfying

$\frac{1}{n} \sum_{i=1}^{n} \chi\left(\frac{u_i}{s}\right) = b$, (17)

where $\chi(u)$ is a function which is even, non-decreasing for $u > 0$, bounded and continuous. The right hand
side is generally taken so that if $u$ is N(0,1), $E\chi(u) = b$. This condition makes the estimator converge
to the standard deviation when applied to a random sample from the N(0,1) distribution. A popular family of
functions $\chi$ to compute scale M-estimators is the bisquare family given by

$\chi_c(u) = \begin{cases} 1 - \left(1 - (u/c)^2\right)^3 & \text{if } |u| \le c, \\ 1 & \text{if } |u| > c. \end{cases}$ (18)
Ghement et al. (2008) prove that, if $\mu(t)$ is continuous, then under general conditions on $\chi$ Condition A6 is
satisfied with $\sigma_0$ defined by

$E_G\, \chi_c\left(\frac{u_2 - u_1}{\sqrt{2}\, \sigma_0}\right) = b$. (19)
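
A scale M-estimator of the kind described in Remark 4 can be computed by bisection, since the left hand side of the defining equation is non-increasing in $s$. The following is a minimal sketch, not the authors' implementation: the bisquare function is written in its standard form, the helper names are hypothetical, and the normalization by $\sqrt{2}$ of the difference-based estimator is an assumption made here so that the estimator targets the standard deviation under normal errors.

```python
import math

def chi_bisquare(u, c=0.7094):
    """Standard bisquare chi: 1 - (1 - (u/c)^2)^3 on [-c, c], and 1 outside."""
    if abs(u) > c:
        return 1.0
    return 1.0 - (1.0 - (u / c) ** 2) ** 3

def m_scale(values, c=0.7094, b=0.75, iterations=200):
    """Solve mean(chi(v / s)) = b for s by bisection (hypothetical helper)."""
    top = max(abs(v) for v in values)
    if top == 0.0:
        return 0.0
    lo, hi = 0.0, 10.0 * top / c  # at s = hi every chi value is far below b
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        mean_chi = sum(chi_bisquare(v / mid, c) for v in values) / len(values)
        if mean_chi > b:
            lo = mid  # scale too small: too much of the sample counts as large
        else:
            hi = mid
    return 0.5 * (lo + hi)

def difference_scale(x, c=0.7094, b=0.75):
    """M-scale of successive differences; the 1/sqrt(2) factor is an assumption."""
    diffs = [x[j + 1] - x[j] for j in range(len(x) - 1)]
    return m_scale(diffs, c, b) / math.sqrt(2.0)
```

Working with successive differences removes most of a smooth trend before the scale is estimated, which is why no preliminary fit of $\mu$ is needed in this construction.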



Remark 5 An alternative scale estimator, which does not require the continuity of $\mu$, is provided by

CTn = "^edianduil,..., |u„|) 

where are the residuals corresponding to the Liisotonic estimator. We conjecture but we do not 

have a proof that this estimator converges also with rate to ctq =medianG{\u\) /^~^ {3/ . 



5 Influence Function

In order to obtain the influence function of the isotonic M-estimator at a given point $t$ we need to assume
that the pair $(x, t)$ is random. In this case the isotonic regression model assumes that $x = \mu(t) + u$, where $u$
is independent of $t$ and $\mu(t)$ is non-decreasing. We assume that the error term $u$ has a symmetric density
and that the observation point $t$ has a distribution with density $h$.

We start by assuming that $\sigma_0$ is known and suppose that we want to estimate $\mu(t_0)$. Given an arbitrary
distribution $\Lambda$ of $(x, t)$, the isotonic M-estimating functional of $\mu(t_0)$, which we henceforth denote by $T_{t_0}(\Lambda)$,
is defined in three steps as follows. First, for $r, s \ge 0$ let $m(t_0, r, s, \Lambda)$ be defined as the value $m$ satisfying

$\int_{t_0 - r}^{t_0 + s} \int_{-\infty}^{\infty} \psi\left(\frac{x - m}{\sigma_0}\right) d\Lambda(x, t) = 0$.

Let

$m^-(t_0, r, \Lambda) = \min_{s \ge 0} m(t_0, r, s, \Lambda)$,

and then $T_{t_0}(\Lambda)$ is defined by

$T_{t_0}(\Lambda) = \max_{r \ge 0} m^-(t_0, r, \Lambda)$.

Let $\Lambda_n$ be the empirical distribution of $\{(x_{n,j}, t_{n,j}), 1 \le j \le n\}$; then if $\hat\mu_n(t)$ is the estimator defined in (4),
we have

$\hat\mu_n(t) = T_t(\Lambda_n)$.

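It is immediate that if $\Lambda_0$ is the joint distribution corresponding to model (1) we have $T_{t_0}(\Lambda_0) = \mu(t_0)$,
so that the isotonic M-estimator is Fisher-consistent. Consider now the contaminated distribution

$\Lambda_{\varepsilon, t^*, x^*} = (1 - \varepsilon) \Lambda_0 + \varepsilon\, \delta_{(t^*, x^*)}$,

where $\delta_{(t^*, x^*)}$ represents a point mass at $(t^*, x^*)$. In this case we define the influence function of $T_{t_0}$ by

$\mathrm{IF}^*(T_{t_0}, t^*, x^*) = \lim_{\varepsilon \to 0} \frac{\left(T_{t_0}(\Lambda_{\varepsilon, t^*, x^*}) - T_{t_0}(\Lambda_0)\right)^2}{\varepsilon}$. (20)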

Then, we have the following theorem:

Theorem 3 Consider the isotonic regression model given in (1) and let $T_{t_0}$ be an isotonic M-estimating
functional, where $t_0$ is an interior observation point. Then, under assumptions A1-A4 we have

$\mathrm{IF}^*(T_{t_0}, t^*, x^*) = \begin{cases} \dfrac{2 \mu'(t_0)\, \sigma_0\, |\psi((x^* - \mu(t_0))/\sigma_0)|}{h(t_0)\, E_G(\psi'(u/\sigma_0))} & \text{if } t^* = t_0, \\ 0 & \text{if } t^* \ne t_0. \end{cases}$ (21)

Notice that in the numerator of (20) the square of the bias appears, instead of the plain bias used in the
classical definition of Hampel (1974). Therefore, for the isotonic M-estimator $T_{t_0}$ the bias caused by a point
mass contamination at $(t_0, x^*)$ is of order $\varepsilon^{1/2}$ instead of the usual order $\varepsilon$.

Alternatively, it is also of interest to know what happens when we are estimating $\mu(t_0)$ and contamination
takes place at a point $t^* \ne t_0$. According to (21), the influence function in this case is zero. This occurs
because in this case, for $\varepsilon$ sufficiently small, $T_{t_0}(\Lambda_{\varepsilon, t^*, x^*}) = T_{t_0}(\Lambda_0)$.

It is easy to show that when we use a scale $\hat\sigma_n \to \sigma_0$ defined by a continuous functional, the influence
function of the isotonic M-estimator is still given by (21).



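6 Breakdown Point

Roughly speaking, the breakdown point of an estimating functional $T_{t_0}$ of $\mu(t_0)$ is the smallest fraction of
outliers which suffices to drive $|T_{t_0}|$ to infinity. More precisely, consider the contamination neighborhood
$V_{\Lambda_0, \varepsilon}$ of the distribution $\Lambda_0$ of size $\varepsilon$ defined as

$V_{\Lambda_0, \varepsilon} = \{\Lambda : \Lambda = (1 - \varepsilon) \Lambda_0 + \varepsilon \Lambda^*\}$,

where $\Lambda^*$ is an arbitrary distribution of $(x, t)$ such that $t$ takes values in $[0, 1]$ and $x$ in $\mathbb{R}$. The asymptotic
breakdown point of $T_{t_0}$ at $\Lambda_0$ is defined by

$\varepsilon^*(T_{t_0}, \Lambda_0) = \inf\left\{\varepsilon : \sup_{\Lambda \in V_{\Lambda_0, \varepsilon}} |T_{t_0}(\Lambda)| = \infty\right\}$.

We start by considering the case where $\sigma_0$ is known. Then we have the following theorem.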

Theorem 4 Consider the isotonic regression model given in (1) and let $T_{t_0}$ be an isotonic M-estimating
functional where $t_0$ is an interior observation point. Then under assumptions A1-A4 we have

$\varepsilon^*(T_{t_0}, \Lambda_0) \ge \min\left\{\frac{H(t_0)}{1 + H(t_0)}, \frac{1 - H(t_0)}{2 - H(t_0)}\right\}$.

In the special case when $H$ is uniform, this becomes

$\varepsilon^*(T_{t_0}, \Lambda_0) \ge \min\left\{\frac{t_0}{1 + t_0}, \frac{1 - t_0}{2 - t_0}\right\}$, (22)

which takes a maximum value of 1/3 at $t_0 = 1/2$.
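
The lower bound of Theorem 4 is easy to tabulate. The short sketch below evaluates it as a function of $H(t_0)$ and checks on a grid that, for uniform $H$, the bound (22) peaks at $t_0 = 1/2$; the function name is hypothetical.

```python
def breakdown_lower_bound(h_value):
    """Lower bound of Theorem 4 as a function of H(t0), with 0 < H(t0) < 1."""
    return min(h_value / (1.0 + h_value), (1.0 - h_value) / (2.0 - h_value))

# Uniform H: H(t0) = t0. Evaluate the bound on a grid of interior points.
grid = [j / 100.0 for j in range(1, 100)]
best_t0 = max(grid, key=breakdown_lower_bound)
```

The first term of the minimum increases and the second decreases in $H(t_0)$, so the bound is largest where they cross, and it tends to zero as $t_0$ approaches either end of the support.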



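In the case that $\sigma_0$ is replaced by an estimator $\hat\sigma_n$ derived from a continuous functional $S$, it can be
proved that the breakdown point of $T_{t_0}$ satisfies

$\varepsilon^*(T_{t_0}, \Lambda_0) \ge \min\left\{\frac{H(t_0)}{1 + H(t_0)}, \frac{1 - H(t_0)}{2 - H(t_0)}, \varepsilon^*(S, \Lambda_0)\right\}$,

where $\varepsilon^*(S, \Lambda_0)$ denotes the breakdown point of $S$ at $\Lambda_0$. Ghement et al. (2008) showed that if $\hat\sigma_n$ is
defined as in Remark 4, where $s$ is defined by (17)-(19) with $c = 0.7094$ and $b = 3/4$, then $\varepsilon^*(S, \Lambda_0) = 0.5$.
Moreover, in this case $\sigma_0$ coincides with the standard deviation when the error has a normal distribution.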
Figure 1: Infant Mortality Data. The solid line corresponds to the classical isotonic regression and the
dashed line to the isotonic M-estimate.

7 Examples

Example 1 In this section we consider data on infant mortality across countries. The dependent variable,
the number of infant deaths per thousand births, is assumed to be decreasing in the country's per capita income.
These data are part of the R package "faraway" and were used in Faraway (2004). The manual of this package
only mentions that the data are not recent, but it does not give information on the year and source. In Figure
1 we compare the $L_2$ isotonic regression estimator with the isotonic M-estimator computed with Huber's
function with $k = 0.98$ and $\hat\sigma_n$ as in Remark 4, where $s$ is defined by (17)-(18) with $c = 0.7094$ and $b = 3/4$.
There are four countries with mortality above 250: Saudi Arabia (650), Afghanistan (400), Libya (300) and
Zambia (259). These countries, especially Saudi Arabia and Libya due to their higher relative income per
capita, exert a large impact on the $L_2$ estimator. The robust choice, on the other hand, appears to be resistant
to these outliers and provides a good fit.

Example 2 We reconsider the Global Warming dataset first analyzed in the context of isotonic regression
by Wu, Woodroofe and Mentz (2001) from a classical perspective and subsequently analyzed from
a Bayesian perspective in Alvarez and Dey (2009). The original data are provided by Jones et al. (see
http://cdiac.esd.ornl.gov/trends/temp/jonescru/jones.html) and contain annual temperature anomalies from
1858 to 2009, expressed in degrees Celsius relative to the 1961-1990 mean. Even though the global
warming data, being a time series, might be affected by serial correlation (e.g., Fomby and Vogelsang, 2002),
we opted for simplicity, as an illustration, to ignore that aspect of the data and model it as a sequence of i.i.d.
observations.

Figure 2: World Annual Weather Anomalies

In Figure 2 we plot the $L_2$ isotonic estimator, which for these data is identical to the isotonic M-estimate
with $k = 0.98$. Visual inspection of the plot shows a moderate outlier corresponding to the year 1878 (shown as
a solid circle). That apparent outlier, however, has no effect on the estimator due to the isotonic character
of the regression. The fact that the $L_2$ and the isotonic M-estimates coincide for these data seems to indicate
that the phenomenon of global warming is not due to isolated outlying anomalies, but instead to
a steadily increasing trend. In our view, that validates, from the point of view of robustness, the
conclusions of other authors on the same data (e.g., Wu, Woodroofe and Mentz (2001), and Alvarez and Dey
(2009)), who have rejected the hypothesis of constancy in the series of the world's annual temperatures in favor
of an increasing trend.
8 Monte Carlo results

Interestingly, the limiting distribution of the isotonic M-estimator is based on the ratio

$\frac{\left(E_G(\psi'(u/\sigma_0))\right)^2}{E_G(\psi^2(u/\sigma_0))}$

as in the i.i.d. location problem (e.g., Maronna, Martin and Yohai, 2006). The slower convergence rate,
however, entails that the respective asymptotic relative efficiencies are those of the location situation taken
to the power 2/3. Specifically, note that from Theorem 1, for any isotonic M-estimator,

$\mathrm{avar}\left\{n^{1/3} [\hat\mu_n(t_0) - \mu(t_0)]\right\}$ (23)

$= \left[\frac{\mu'(t_0)\, \sigma_0^2\, E_G(\psi^2(u/\sigma_0))}{2 h(t_0) \left(E_G(\psi'(u/\sigma_0))\right)^2}\right]^{2/3} \mathrm{var}[\mathrm{slogcm}(W(v) + v^2)]$, (24)

where avar stands for asymptotic variance and var for variance.
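
As a worked check of this asymptotic variance, the sketch below evaluates it for the $L_2$ and $L_1$ estimators and for a Huber M-estimator under normal errors, and compares the results with the avar column of Table 1. It rests on several assumptions that are stated here rather than taken from the text: $\mu'(t_0) = 5$ and $h(t_0) = 1$ (values consistent with the simulation design), $\sigma_0 = 1$ under normal errors, the value 1.04 for the variance of the limiting variable, and the $L_1$ case treated through the limit in Remark 2. The function names are hypothetical.

```python
import math

VAR_SLOGCM = 1.04  # approximate variance of slogcm(W(v) + v^2), as quoted in the text

def avar(ratio, mu_prime=5.0, h=1.0, sigma0=1.0):
    """Asymptotic variance with ratio = E psi^2(u/sigma0) / (E psi'(u/sigma0))^2."""
    return (mu_prime * sigma0 ** 2 * ratio / (2.0 * h)) ** (2.0 / 3.0) * VAR_SLOGCM

# L2: psi(u) = 2u, so the ratio equals var(u).
avar_l2_normal = avar(1.0)
avar_l2_student = avar(3.0)  # the variance of Student t with 3 degrees of freedom is 3

# L1 (limiting case): the ratio equals 1 / (2 g(0))^2.
g0_normal = 1.0 / math.sqrt(2.0 * math.pi)
g0_student3 = math.gamma(2.0) / (math.sqrt(3.0 * math.pi) * math.gamma(1.5))
avar_l1_normal = avar(1.0 / (2.0 * g0_normal) ** 2)
avar_l1_student = avar(1.0 / (2.0 * g0_student3) ** 2)

def huber_ratio_normal(k):
    """E psi_k^2(u) / (E psi_k'(u))^2 for u standard normal and Huber psi."""
    cdf = 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)
    e_psi_prime = 2.0 * cdf - 1.0                      # P(|u| < k)
    e_psi_sq = e_psi_prime - 2.0 * k * pdf + 2.0 * k * k * (1.0 - cdf)
    return e_psi_sq / e_psi_prime ** 2

avar_m_normal = avar(huber_ratio_normal(0.98))
```

Under these assumptions the computed values agree with the tabulated asymptotic variances to about two decimal places.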

In order to determine the finite sample behavior of the isotonic M-estimators we have performed a Monte
Carlo study. We took i.i.d. samples from the model (1) with trend term $\mu(t) = 10 + 5t^2$ and where the
distribution $G$ is N(0,1) and Student with three degrees of freedom. The values $\{t_i = i/(n+1), 1 \le i \le n\}$
correspond to a uniform limiting distribution $H(t) = t$ for $0 \le t \le 1$.

We estimated $\mu(t_0)$ at $t_0 = 1/2$, the true value of which is $\mu(t_0) = 11.25$, using three isotonic estimators:
the $L_2$ isotonic estimator, the $L_1$ isotonic estimator and the same isotonic M-estimator that was used in the
examples. We performed $N = 500$ replicates at two sample sizes, $n = 100$ and $500$. Dykstra and Carolan
(1998) have established that the variance of the random variable "slogcm$(W(v) + v^2)$" is approximately
1.04. Using this value, we present in Table 1 sample mean square errors (MSE) times $n^{2/3}$ as well as the
corresponding asymptotic variances.
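
A reduced version of this experiment can be sketched as follows for the $L_2$ estimator under normal errors. This is an illustration only, not the code used for Table 1: the trend $10 + 5t^2$ is assumed (it matches the stated true value 11.25 at $t_0 = 1/2$), the target is the grid point nearest to $1/2$ rather than $1/2$ itself, and the function names, seed and number of replicates are arbitrary choices.

```python
import random

def l2_isotonic_at(x, i):
    """L2 isotonic fit at index i as the max over u <= i of the min over v >= i of window means."""
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)  # prefix sums give window means in O(1)
    n = len(x)
    best = float("-inf")
    for u in range(i + 1):
        inner = min((prefix[v + 1] - prefix[u]) / (v + 1 - u) for v in range(i, n))
        best = max(best, inner)
    return best

def scaled_mse(n, reps, seed=0):
    """Monte Carlo MSE times n^(2/3) at the grid point closest to t0 = 1/2."""
    rng = random.Random(seed)
    t = [(j + 1) / (n + 1) for j in range(n)]
    i = n // 2
    truth = 10.0 + 5.0 * t[i] ** 2
    total = 0.0
    for _ in range(reps):
        x = [10.0 + 5.0 * tj ** 2 + rng.gauss(0.0, 1.0) for tj in t]
        total += (l2_isotonic_at(x, i) - truth) ** 2
    return total / reps * n ** (2.0 / 3.0)
```

For example, `scaled_mse(100, 200, seed=1)` gives one noisy estimate of the quantity reported in the first row of Table 1; the value depends on the seed and the number of replicates.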



Estimator    n = 100             n = 500             avar
             Normal   Student    Normal   Student    Normal   Student
L2           1.93     3.78       1.85     3.65       1.92     3.98
L1           2.38     2.89       2.67     2.76       2.59     2.89
M            2.04     2.86       2.11     2.51       2.06     2.53

Table 1. Sample MSE and avar for Isotonic Regression Estimators.



We note that for both distributions, the empirical MSEs for $n = 500$ are close to the avar values.
We also see that the M-estimator is more efficient than the $L_1$ one under both distributions, and that the
M-estimator is slightly less efficient than the $L_2$ estimator in the normal case but much more efficient
under the Student distribution. In summary, the isotonic M-estimate seems to behave well under both
distributions.

9 Appendix

9.1 Proof of Lemma 1

Without loss of generality we can assume that $\sigma_0 = 1$. Given $c > 0$, for sufficiently large $n$ there exist
positive numbers $\beta_l(n)$ and $\beta_u(n)$ for which

$H(t_0) - H(t_0 - \beta_l(n)) = H(t_0 + \beta_u(n)) - H(t_0) = c n^{-1/3}$.

As in Wright (1981), we first argue that

$P[\hat\mu_n(t_0) \ne \hat\mu_n^c(t_0)] \le P(\Omega_{1n}) + P(\Omega_{2n})$,

where

$\Omega_{1n} = \left\{\min_{v \ge t_0} \hat\mu_n[t_0 - \beta_l(n), v] \le \max_{u \le t_0 - a_l(n)} \hat\mu_n[u, t_0 - \beta_l(n)]\right\}$, (25)

$\Omega_{2n} = \left\{\max_{u \le t_0} \hat\mu_n[u, t_0 + \beta_u(n)] \ge \min_{v \ge t_0 + a_u(n)} \hat\mu_n[t_0 + \beta_u(n), v]\right\}$. (26)

To see this, note that the complement of Vl2n is the set in which, for all u < to and all v > to + /3„(n) we 
have that {/i„[u,io + Puin)) < /trj^o + /3m("-)i^]}- Since is non-decreasing we can write 

to + /3u("-)) < u]. 

This in turn entails that in $\Omega_{2n}^c$

$\hat\mu_n(t_0) = \max_{u \le t_0}\ \min_{t_0 \le v \le t_0 + a_u(n)} \hat\mu_n[u, v]$.

Using the fact that the maximum and the minimum may be reversed in computing these estimators (e.g.,
Robertson and Waltman, 1968) and a similar argument for $\Omega_{1n}$ in equation (25), one can show that

$P\{\Omega_{1n}^c \cap \Omega_{2n}^c\} \le P\{\hat\mu_n^c(t_0) = \hat\mu_n(t_0)\}$.

So we need to prove that

$\lim_{c \to \infty} \limsup_{n \to \infty} P(\Omega_{1n}) = \lim_{c \to \infty} \limsup_{n \to \infty} P(\Omega_{2n}) = 0$.

We will prove $\lim_{c \to \infty} \limsup_{n \to \infty} P(\Omega_{1n}) = 0$. The result for $\Omega_{2n}$ can be obtained in a similar manner.
Let

$A_{1n} = \left\{\min_{v \ge t_0} \hat\mu_n[t_0 - \beta_l(n), v] \le \mu(t_0 - \beta_l(n))\right\}$, (27)
$A_{2n} = \left\{\max_{u \le t_0 - a_l(n)} \hat\mu_n[u, t_0 - \beta_l(n)] \ge \mu(t_0 - \beta_l(n))\right\}$. (28)

Since

$P(\Omega_{1n}) \le P(A_{1n}) + P(A_{2n})$,

it will be enough to prove that

$\lim_{c \to \infty} \limsup_{n \to \infty} P(A_{in}) = 0,\quad i = 1, 2$. (29)

Since the proofs of (29) for $i = 1$ and $2$ are similar, (29) will be proved only for $i = 1$. By the fundamental
identity (9) we have

$A_{1n} = \left\{\min_{v \ge t_0} S_n(t_0 - \beta_l(n), v, \mu(t_0 - \beta_l(n))) \le 0\right\}$. (30)

In the sequel, in order to simplify notation, we will omit the subindex $n$, writing, for example,
$u_{n,j} = u_j$, and making it explicit only when there is a risk of confusion. We can write

Sn{to- l3i{n),v,fi{to- ^i{n))) ^ ^ ip {xj - fi{ta ~ l3i{n)) 

jeG{to-pi{n),v) 

V'K + (/ife)-Mio-/3iN)), 

j6C(to-/3i(").f) 

and by a Taylor expansion we get 

jeC{tQ-Pi{n),v) 

where < a* < fJ,{tj) — /i(to — f3i{n). Put r — sup-0', then 

Sn{to-Mn),v,fi{to-l3i{nm< ^ V'KO+r ^ (^^(^t,) - ^i{to - I3i{n)). 

jeC{tQ-l3i{n),v) jfzC{to-l3i{n),v) 

Thus, since fj,{t) is increasing we get 
mm Sn {to - l3iin),v,n{to- I3i{n))) <min il){uj)+T {^{tj)- ^i{tQ- l3i{n)). (31) 

V>tQ V>tQ ^ ^ ^ ^ 

jGC(to-A («),!;) jeC(to-ft(«)>«o) 

Put ni{v) := #{j : to — I3i{n) < tnj < v}. As > ni{to), we obtain 

1 



?)>to ni{v) 



Sn{to-l3i{n),v,fi{to-f3i{n))) (32) 



<niin^ E ^i^j) + ^rT E (M(i,)-Mio-A(n)). (33) 
t)>to ni(v) ^-^ nAto) ^-^ 

Therefore the event Ai„ defined in ([50)1 is included in the event A„ defined by 

A„ = <( niax V --(/-(mj) > — ^ V {n{tj) - ^(^0 - Pi{n)) \ . 

I v>to ni{v) ^-^ niitn) ^-^ I 



The equation above can be rewritten in terms of integrals with respect to the empirical distribution of the 

i's as 

1 



max ■ 



to 



J2 -^{u,)>T \fi{s)~^i{to-l3i{n))\dHn{s). (34) 

3&C{to-pi{n),v) " '^^ > 

Since u;, . . . , u„ are i.i.d., relabelling the u^ 's on the left hand side we get that 

P(Ai„) < P(A:), (35) 
where



max 

ni(to')<k<n k 



ni(to)<j<k 

Adding and subtracting dH{s) wc can write 

rta 



to 



\^l{s) ^ fi{to - piin))\dHn{s) 



r [iji{s) - - Pi{n))]dHn{s) = r [ti[s)-ii{to-fii{n))]dH{s) 

Jto~Pi{n) Jtn-BiM 



to 



[li{s) - M(io - Pi{n))]d{Hn{s) - H{s)). 



Using for n large enough, the second term in the above equation is bounded by 



to 



[li{s) - n{to - Pi{n))]dH^{s) - H[s)) 



to-A(n) 



< 2 {ii{to) - n{to - I3i{n)) sup |i/„(t) - H{t)\ 

t 

<2/x'(io)/3iWn-i/3o(l), 



and since by the inverse function theorem (3i{n) = c[H'{to)] ^/■^[l + o(l)], we obtain that for 
constant A which does not depend on c we can write 



[^(s) - /i(io - l3i{n))]d{Hn{s) - H{s)) 



< Acn-^^^o{l). 



/to-A(n) 

Consider now the first term in the right hand side of Equation ([37|) . Using (jlip we have 
[n{s)-^i{to-l3i{n))]dH{s) 



(to - 01 (n), to] 



Hto) - ~ l3i{n))]dH{s) 



to 



to-/3i(«) 



to-Pl{n 

= [/i(io) - /i(io - MnMHito) - H{to - AW)] 
Kto) - M(io - I3i{n))\ fH{to) - H{to - I3i{n)) 



to 



< 



= ti'{to)[l + o{l)]H'{to)[l + o(l)]A(n)2 
^^i'ito)[H'ito)]-'c'n-^/'[l + oil)]. 



to-A(n) 



[fi{to) - n{s)]dH{i 



Therefore 



to-A(«) 



[fi{s) - fi{to - f3i{n))]dHn{s) 

< ti'ito)c'[H'{to)]-'n-^^^l + o(l)) + 2ti'{to)c[H'ito)]-'n-'/^o{l) 

< ^l'{toy[H'{to)]-^n-^/^ {n-^/^[l + o(l)] + 2n-^/^o{l)} 
with $c^* = \max(c, c^2)$. Then, for some constant $B$ which does not depend on $c$ we can write

\ii{s) ~ fi{tQ - (3i{n))]dH„{s) < Bc*n-^l^. (39) 

to-ft(n) 



From (30), (34), (35), (36), (37), (38) and (39) we derive that there exists a constant $D$ independent of $c$
such that for $n$ large enough and $c > 1$



P(Ai„) < P < max i V -V'(w,) > Dc^vr^l^ \ 



(40) 



At this point, we use the Hajek-Renyi Maximal Inequality (e.g., Shorack, 2000), which asserts that for a
sequence $y_1, \ldots, y_n$ of independent random variables with mean 0 and finite variances and for a positive
non-decreasing real sequence $\{b_k, k \in \mathbb{N}\}$,



P < max 

I 'm<k<n 



i=i Vj 



hk 

Using this inequality from (PO)) we get that 



J ik=l fc=m+l J 



l<j<fe J 

Approximating the Riemann sum we obtain 

y k-^<-^. (43) 

and since by (11) $n_l(t_0) = c\, n^{2/3} (1 + o(1))$, for $n$ large enough we have

$n_l(t_0)^{-1} \le 2 c^{-1} n^{-2/3}$. (44)

From (42), (43) and (44) we derive that for $n$ large enough

2Eg(^2(w)) 



P(Ai„) < 



< 



4Eg(V''(u)) 



Then the Lemma follows immediately. 
9.2 Proof of Theorem 1

Without loss of generality we can assume that $\sigma_0 = 1$. Since $a_l(n) = a_u(n) = 2c [H'(t_0)]^{-1} n^{-1/3} [1 + o(1)]$,
and $c$ is arbitrary, we will consider the localized estimator

$\hat\mu_n^c(t_0) = \max_{t_0 - c n^{-1/3} \le u \le t_0}\ \min_{t_0 \le v \le t_0 + c n^{-1/3}} \hat\mu_n(u, v) = \max_{u \le t_0} \min_{v \ge t_0} \hat\mu_n^c(u, v)$, (45)

where $\hat\mu_n^c(u, v)$ is defined as the root of

$S_n^c(u, v, \mu) = \sum_{j \in D(u,v)} \psi(x_j - \mu) = 0$ (46)

over $D(u, v) = \{j : 1 \le j \le n;\ t_0 - c n^{-1/3} \le u \le t_j \le v \le t_0 + c n^{-1/3}\}$. Note that the localized estimator
depends on the $t_j$'s that lie in a neighborhood about $t_0$ which shrinks at a rate $n^{-1/3}$. To proceed with the
development of the asymptotic distribution let now $w_j = n^{1/3}(t_j - t_0)$, $r = n^{1/3}(u - t_0)$ and $s = n^{1/3}(v - t_0)$.
With this notation, $\hat\mu_n^c(u, v)$ is a root of the partial sums in the parametrization

$S_n^c(r, s, \mu) = \sum_{j \in B(r,s)} \psi(x_j - \mu) = 0$, (47)

where $B(r, s) = \{j : 1 \le j \le n;\ r \le w_j \le s;\ r, s \in [-c, c]\}$, so that the relabelling implies $\hat\mu_n^c(u, v) = \hat\mu_n^c(r, s)$.
Consequently,

$\hat\mu_n^c(t_0) = \max_{r \le 0} \min_{s \ge 0} \hat\mu_n^c(r, s)$.


Now a Taylor expansion of lJ,{tj) around to for any j G B{r, s) gives 

fl{tj) = fl{to) + lJ.'{to){tj - to) + 0{\tj - tol) 
= ^i{to) + iJ.'{to)n-^I^Wi + Oj{n-^l^) 

which entails that 

Xj = ii{to) + iJ,'{to)n~'^/^Wj + Uj + Oj{n~^l^). 

Using the equivariance of M-estimators, the monotonicity of ij) and the fact that is bounded, it can be 
proved that 

$$\hat\mu_n^c(r,s) = \mu(t_0) + \mu_n^{*}(r,s) + o(n^{-1/3}),$$

where $\mu_n^{*}(r,s)$ solves

$$S_n^{*}(r,s,\mu) = \sum_{j \in B(r,s)} \psi\left(n^{-1/3}\mu'(t_0)w_j + u_j - \mu\right) = 0 \tag{48}$$

and 



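Thus, using that the $w_j$ are bounded over $-c \le r \le w_j \le s \le c$, we have

$$\hat\mu_n^c(t_0) = \mu(t_0) + \max_{-c \le r \le 0}\,\min_{0 \le s \le c}\,\mu_n^{*}(r,s) + o(n^{-1/3}).$$

This entails that

$$n^{1/3}\left[\hat\mu_n^c(t_0) - \mu(t_0)\right] = \max_{r \le 0}\,\min_{s \ge 0}\, n^{1/3}\mu_n^{*}(r,s) + o(1),$$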



where 



< K2C\ 



Then, we only need to obtain the asymptotic distribution of 

^nc = n^/^maxmin/if;(r, s). 

r<0 s>0 
Let $\mu_n^{**}(r,s)$ be the solution of



j&B{r,s} 



Since $|n^{-1/3}\mu'(t_0)w_j| \le n^{-1/3}\mu'(t_0)c$ we have that



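$$\mu_n^{*}(r,s) = \mu_n^{**}(r,s) + d_{nrs}, \tag{49}$$

where

$$|d_{nrs}| \le K_3 c\,n^{-1/3}. \tag{50}$$

We will now approximate $n_{rs}$, the number of indices in $B(r,s)$, as follows:

$$\begin{aligned}
\frac{n_{rs}}{n} &= \frac{1}{n}\#\{1 \le j \le n : r \le w_j \le s\} \\
&= \frac{1}{n}\#\{1 \le j \le n : t_0 + rn^{-1/3} \le t_j \le t_0 + sn^{-1/3}\} \\
&= [H_n(t_0 + sn^{-1/3}) - H(t_0 + sn^{-1/3})] + [H(t_0 + sn^{-1/3}) - H(t_0 + rn^{-1/3})] - [H_n(t_0 + rn^{-1/3}) - H(t_0 + rn^{-1/3})] \\
&= H'(t_0)(s-r)n^{-1/3} + o(n^{-1/3}) \qquad (51) \\
&= n^{-1/3}H'(t_0)(s-r)[1 + o(1)], \qquad (52)
\end{aligned}$$

and therefore

$$n_{rs} = n^{2/3}H'(t_0)(s-r)[1 + o(1)], \tag{53}$$

and

$$\frac{1}{n_{rs}^{1/2}} = \frac{1}{n^{1/3}H'(t_0)^{1/2}(s-r)^{1/2}(1 + o(1))}. \tag{54}$$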
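The approximation $n_{rs} = n^{2/3}H'(t_0)(s-r)[1+o(1)]$ can be checked numerically. The sketch below counts the design points falling in the shrinking window for an equispaced design $t_j = j/n$ on $(0,1]$, for which $H'(t_0) = 1$. The design, the values of $n$, $t_0$, $r$, $s$, and the function name are assumptions chosen for illustration.

```python
def window_count(n, t0, r, s):
    """Number of design points t_j = j/n with r <= n^(1/3) (t_j - t0) <= s."""
    scale = n ** (1.0 / 3.0)
    return sum(1 for j in range(1, n + 1) if r <= scale * (j / n - t0) <= s)


n, t0, r, s = 10 ** 5, 0.5, -1.0, 2.0
count = window_count(n, t0, r, s)
# H'(t0) = 1 for the equispaced (uniform) design assumed here.
approx = n ** (2.0 / 3.0) * 1.0 * (s - r)
ratio = count / approx
```

The ratio is close to 1 already for moderate $n$, which is what allows the sums over $B(r,s)$ to be normalized by $n^{2/3}$ in the rest of the proof.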

Then, taking $n_{rs} \to \infty$ and applying the law of large numbers, it is easy to show that $\mu_n^{**}(r,s) \to 0$ a.s. and therefore by (49) and (50) $\mu_n^{*}(r,s) \to 0$ too. Since $\mu_n^{*}(r,s)$ satisfies (48), by a Taylor expansion of $S_n^{*}(r,s,\mu)$
we get

J2 ^K)- E V''(«,)(A^(^s)-n-i/V(io)«^,) 

B(r,s) B{r,s) 
B{r,s) 

= 0. 



From here we obtain 



^ E,gB(.,.) ^(%-) + A^Xto)^'^/^ ^.^'K) + n-^/^^i'itor 
and then

By (48), the Law of Large Numbers, $|w_j| \le c$ and $\psi''$ bounded we have



(55) 



jes(r,s) 
jes(r,s) 



and 



$$\frac{1}{n^{2/3}}\sum_{j \in B(r,s)} \psi'(u_j) \to (s-r)H'(t_0)E_G(\psi'(u)) \quad \text{a.s.}$$



Then, (55) entails

$$(s-r)E_G(\psi'(u))H'(t_0)\,n^{1/3}\mu_n^{*}(r,s) = \frac{1}{n^{1/3}}\sum_{j \in B(r,s)} \psi(u_j) + \mu'(t_0)\frac{1}{n^{2/3}}\sum_{j \in B(r,s)} w_j\psi'(u_j) + o(1). \tag{56}$$



Let 

M'(io) 



x; v-c^i) if s > 



nV3Eo(*'M)°/»/f(f„)'/S,|i,„, " ' * 

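By (53) and the Central Limit Theorem we have that for any finite set of numbers $s_1, s_2, \dots, s_r$, $-c \le s_i \le c$, the random vector $(B_n(s_1), \dots, B_n(s_r))$ converges in distribution to $N(0, \Sigma)$ where $\Sigma = (\sigma_{ij})$ with

$$\sigma_{ij} = \begin{cases}
s_i \wedge s_j & \text{if } s_i \ge 0,\ s_j \ge 0, \\
(-s_i) \wedge (-s_j) & \text{if } s_i < 0,\ s_j < 0, \\
0 & \text{if } s_i \ge 0,\ s_j < 0.
\end{cases}$$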
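The limiting covariance is that of a two-sided Brownian motion: $\min(s_i, s_j)$ when both points are non-negative, $\min(-s_i, -s_j)$ when both are negative, and zero across the origin. A minimal sketch that builds this matrix for a given grid is shown below; the function name and the grid are hypothetical.

```python
def two_sided_bm_cov(points):
    """Covariance matrix of a two-sided Brownian motion at the given points."""
    def cov(a, b):
        if a >= 0 and b >= 0:
            return min(a, b)
        if a < 0 and b < 0:
            return min(-a, -b)
        return 0.0  # the two branches are independent across the origin
    return [[cov(a, b) for b in points] for a in points]


grid = [-2.0, -1.0, 0.5, 1.0, 2.0]
sigma = two_sided_bm_cov(grid)
```

The diagonal equals $|s_i|$, so the variance grows linearly on both sides of the origin.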

Moreover, using standard arguments, it can be proved that $B_n(s)$ is tight. Then, we have

$$B_n(s) \xrightarrow{d} B(s), \tag{58}$$

where $B$ is a two-sided Brownian motion.

As for the second term on the right-hand side of (55), define

^ E ip'{uj)wj if s > 



An(s) = <! 1 . (59) 

-JT^ E -V''(%)wj ifs<0 

' s<w,<0 
For $s \ge 0$ we can write

$$A_n(s) = \frac{1}{n}\sum_{j=1}^{n} n^{2/3}\psi'(u_j)(t_j - t_0)\,1\!\left(t_0 \le t_j \le t_0 + sn^{-1/3}\right) \tag{60}$$

and then

$$E(A_n(s)) = E_G(\psi'(u))\int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}(t - t_0)\,dH_n. \tag{61}$$

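Integrating by parts we get

$$\int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}(t - t_0)\,dH_n = n^{1/3}sH_n(t_0 + sn^{-1/3}) - \int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}H_n(t)\,dt, \tag{62}$$

and by (11) we have

$$n^{1/3}sH_n(t_0 + sn^{-1/3}) = n^{1/3}sH(t_0 + sn^{-1/3}) + o(1). \tag{63}$$

We can write

$$\int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}H_n(t)\,dt = \int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}H(t)\,dt + \int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}(H_n(t) - H(t))\,dt,$$

and by (11) we get

$$\left|\int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}(H_n(t) - H(t))\,dt\right| \le sn^{1/3}\sup_t |H_n(t) - H(t)| = o(1). \tag{64}$$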

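Therefore by (62), (63) and (64) we get

$$\begin{aligned}
\int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}(t - t_0)\,dH_n &= n^{1/3}sH(t_0 + sn^{-1/3}) - \int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}H(t)\,dt + o(1) \\
&= \int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}(t - t_0)\,dH + o(1) \\
&= \int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}(t - t_0)H'(t)\,dt + o(1) \\
&= H'(t_0)\int_{t_0}^{t_0 + sn^{-1/3}} n^{2/3}(t - t_0)\,dt + o(1) \\
&= H'(t_0)\frac{s^2}{2} + o(1),
\end{aligned}$$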

and from (61) we get that for $s \ge 0$

$$E(A_n(s)) = E_G(\psi'(u))H'(t_0)\frac{s^2}{2} + o(1).$$
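The quadratic drift $H'(t_0)s^2/2$ comes from the integral $\int_{t_0}^{t_0+sn^{-1/3}} n^{2/3}(t-t_0)\,dH_n$. The sketch below evaluates this integral against the empirical distribution of an equispaced design $t_j = j/n$, for which $H'(t_0) = 1$, and compares it with $s^2/2$. The design, the numerical values and the function name are assumptions for illustration only.

```python
def drift_integral(n, t0, s):
    """(1/n) * sum_j n^(2/3) (t_j - t0) 1(t0 <= t_j <= t0 + s n^(-1/3)), with t_j = j/n."""
    width = s * n ** (-1.0 / 3.0)
    scale = n ** (2.0 / 3.0)
    total = 0.0
    for j in range(1, n + 1):
        d = j / n - t0
        if 0.0 <= d <= width:
            total += scale * d
    return total / n


n, t0, s = 10 ** 5, 0.5, 2.0
value = drift_integral(n, t0, s)
limit = 1.0 * s ** 2 / 2.0  # H'(t0) = 1 for the uniform design assumed here
```

Multiplying by $E_G(\psi'(u))$ gives the mean of $A_n(s)$ for $s \ge 0$ in this special case.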
Now we compute the variance of $A_n(s)$. From (60) we have

$$A_n(s) = \frac{1}{n}\sum_{j=1}^{n} n^{2/3}\psi'(u_j)(t_j - t_0)\,1\!\left(t_0 \le t_j \le t_0 + sn^{-1/3}\right)$$
var(A„(s)) = ^ n'/^it, - tofm < t, <to + sn-''^) 



to 



< var(V''(u))ni/3s2„-2/3^^-i/3 

= var(i/''(u))s^n-2/3 
= o(l) 



Then by ([55)1 we obtain 
Similarly we can prove that 



A„(s) EG(V'K))M'(to)i^'(io)y for s > 0. 



A„(s) EG{i^'iu,))fi'{to)H'{t„)^ for s < 0. 



Therefore from §^ and §7]i we get that 

Now the rest of the proof is as in Wright (1981). 

9.3 Proof of Theorem 2 

We require the following Lemma 

Lemma 2 Assume A1-A5. Then,

$$\left|\frac{\partial}{\partial\sigma}\hat\mu_n(u,v,\sigma)\right| \le k, \quad \text{for all } u \le v.$$



Proof 

Taking the first derivative of Equation ([TS]) with respect to a yields 



If f - i^n{u,v,a)\ ( 1 1 dfin{u,v,a)\ 
J ^ 1 



jec{u.v) 

and then 



5cr ' ' / Xj - fln{u,v,a) 
Let $D(u,v) = C(u,v) \cap \{j : |x_j - \hat\mu_n(u,v,\sigma)|/\sigma \le k\}$. Then by A5 we obtain



da 



finiu,v,a) 



Therefore 



Proof of Theorem 2 

By the mean value theorem 



I (x-j - jin{u,v,a) 



< 



< k. 



da 



jln{u,v,a) 



I ( Xj - jln{u,v,a) 



< k. 



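$$\hat\mu_n(u,v,\hat\sigma_n) = \hat\mu_n(u,v,\sigma_0) + \frac{\partial}{\partial\sigma}\hat\mu_n(u,v,\sigma^*)(\hat\sigma_n - \sigma_0),$$

where $\sigma^*$ is some intermediate point between $\sigma_0$ and $\hat\sigma_n$. Hence, by Lemma 2 we have

$$\max_{u \le t}\,\min_{v \ge t}\,\hat\mu_n(u,v,\hat\sigma_n) - k|\hat\sigma_n - \sigma_0| \le \max_{u \le t}\,\min_{v \ge t}\,\hat\mu_n(u,v,\sigma_0) \le \max_{u \le t}\,\min_{v \ge t}\,\hat\mu_n(u,v,\hat\sigma_n) + k|\hat\sigma_n - \sigma_0|$$

and A6 implies

$$n^{1/3}|\hat\mu_n(t,\hat\sigma_n) - \hat\mu_n(t,\sigma_0)| \le kn^{1/3}|\hat\sigma_n - \sigma_0| = o_p(1).$$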
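The key ingredient of this proof is that the location M-estimate changes by at most $k$ times the change in the scale. The sketch below checks this Lipschitz property numerically for a single window, using the Huber $\psi$, whose derivative vanishes outside $[-k, k]$. The choice of $\psi$, the value $k = 1.345$, the data and the function names are assumptions made for illustration; the paper's assumption A5 is not visible in this section, and the sketch presumes only that it is satisfied by this $\psi$.

```python
K = 1.345  # assumed Huber tuning constant


def huber_psi(x, k=K):
    """Huber psi function: identity on [-k, k], clipped outside."""
    return max(-k, min(k, x))


def m_location_scaled(xs, sigma, k=K, tol=1e-12):
    """Root in mu of sum psi((x - mu) / sigma) = 0, found by bisection."""
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(huber_psi((x - mid) / sigma, k) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


xs = [0.1, 0.4, 0.5, 0.9, 1.1, 7.0]  # hypothetical window with one outlier
sigmas = [0.5 + 0.1 * i for i in range(26)]
fits = [m_location_scaled(xs, sg) for sg in sigmas]
# Largest observed slope |mu(sigma') - mu(sigma)| / |sigma' - sigma| on the grid.
max_slope = max(
    abs(fits[i + 1] - fits[i]) / (sigmas[i + 1] - sigmas[i])
    for i in range(len(sigmas) - 1)
)
```

Because the bound holds uniformly over windows, it passes through the max-min operation, which is how the inequality for the isotonic estimator is obtained.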



9.4 Proof of Theorem 3 

Without loss of generality we can assume that $\sigma_0 = 1$. We consider first the case $t^* = t_0$. Assume that
$x^* < \mu(t_0)$. Then $\delta_{(t_0,x^*)}$ represents a contamination model where an outlier is placed at the observation
point $t_0$ with value $x^*$, which is below the trend $\mu(t_0)$ at that point. Let $k = k(\varepsilon)$ be the value such that



Tt„{Ae^to,x') = m (to, to - fc, Ao) = m{to,to - k,to,Ao). 



It is immediate that 



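$$\mu(t_0 - k) = T_{t_0}(\Lambda_{\varepsilon,t_0,x^*}) = m(t_0, t_0 - k, t_0, \Lambda_0). \tag{69}$$

Then $m(t_0 - k, t_0, \Lambda_0)$ should be the value of $m$ satisfying

$$\varepsilon\psi(x^* - m) + (1 - \varepsilon)\int_{t_0 - k(\varepsilon)}^{t_0}\int_{-\infty}^{\infty} \psi(\mu(t) + u - m)h(t)g(u)\,du\,dt = 0, \tag{70}$$

and, since by (69) $m = \mu(t_0 - k)$, we have

$$\varepsilon\psi(x^* - \mu(t_0 - k(\varepsilon))) + (1 - \varepsilon)\int_{t_0 - k(\varepsilon)}^{t_0}\int_{-\infty}^{\infty} \psi(\mu(t) + u - \mu(t_0 - k(\varepsilon)))h(t)g(u)\,du\,dt = 0. \tag{71}$$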
Applying the Mean Value Theorem to the first term of (71) we can find $0 \le \varepsilon^* \le \varepsilon$ such that



ipix* - ^(io - k{e))) 

= - M(to)) - ^'{x* - ^iito) - k{e*)^i'{to - k{e*))k{e). 
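As for the second term in (71) we also have that

$$\begin{aligned}
\int_{t_0 - k(\varepsilon)}^{t_0}\int_{-\infty}^{\infty} \psi(\mu(t) + u - \mu(t_0 - k(\varepsilon)))h(t)g(u)\,du\,dt
&= \int_{t_0 - k(\varepsilon)}^{t_0}\int_{-\infty}^{\infty} \psi(u)h(t)g(u)\,du\,dt \\
&\quad + \int_{t_0 - k(\varepsilon)}^{t_0}\int_{-\infty}^{\infty} (\mu(t) - \mu(t_0 - k(\varepsilon)))g(u)h(t)\psi'(u + \gamma)\,du\,dt, \qquad (72)
\end{aligned}$$

where $0 \le \gamma \le \mu(t_0) - \mu(t_0 - k(\varepsilon))$. Since $\psi$ is odd and $g$ is even, $\int \psi(u)g(u)\,du = 0$, so that the first term
above vanishes. As for the second term, notice that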



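$$\int_{t_0 - k(\varepsilon)}^{t_0}\int_{-\infty}^{\infty} (\mu(t) - \mu(t_0 - k(\varepsilon)))g(u)h(t)\psi'(u + \gamma)\,du\,dt = \int_{t_0 - k(\varepsilon)}^{t_0} (\mu(t) - \mu(t_0 - k(\varepsilon)))h(t)\,dt \int_{-\infty}^{\infty} \psi'(u + \gamma)g(u)\,du. \tag{73}$$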



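The first integral factor on the right-hand side of the above display can be further approximated. By the
Mean Value Theorem, there exists $\xi(t)$ such that $t_0 - k(\varepsilon) \le \xi(t) \le t_0$ and

$$\begin{aligned}
\int_{t_0 - k(\varepsilon)}^{t_0} (\mu(t) - \mu(t_0 - k(\varepsilon)))h(t)\,dt
&= \int_{t_0 - k(\varepsilon)}^{t_0} \mu'(\xi(t))(t - t_0 + k(\varepsilon))h(t)\,dt \\
&\approx \int_{t_0 - k(\varepsilon)}^{t_0} \mu'(t_0)(t - t_0 + k(\varepsilon))h(t)\,dt \\
&\approx \mu'(t_0)h(t_0)\left[\frac{(t - t_0 + k(\varepsilon))^2}{2}\right]_{t_0 - k(\varepsilon)}^{t_0} \\
&= \frac{1}{2}\mu'(t_0)h(t_0)k^2(\varepsilon). \qquad (74)
\end{aligned}$$

From expressions (72)-(74) we obtain that Equation (71) can be written as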



s [iPix* - fi{to)) ~ ^b'ix* - Ai(io) - kie*)fi'{to - k{e*))k{s)] + il-e)-^i'{to)h{to)k''{e) / ^'{u)g{u+j)du = 0. 
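Dividing both sides of this equation by $\varepsilon$ and using that $k(\varepsilon) \to 0$ and $\gamma \to 0$ when $\varepsilon \to 0$ we obtain

$$\lim_{\varepsilon \to 0} \frac{k^2(\varepsilon)}{\varepsilon} = -\frac{2\psi(x^* - \mu(t_0))}{h(t_0)\mu'(t_0)\int_{-\infty}^{\infty} \psi'(u)g(u)\,du}. \tag{75}$$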

Finally, according to (15^ and using the Mean Value Theorem, we can write 

e-^O e e^O e 

e^O e ' 

where t*{e) Iq. Then using equation ([75)1 we obtain that 

IF {Ttg,to,x ) = lim 



e^O e 
__ 2^i'(to)'ip{xa - A^(^o)) 
/i(to)EG(V/(ti)) 

/i(to)EG(^'(^i)) 
The proof in the case that $x^* > \mu(t_0)$ is similar.

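We consider now the case $t^* > t_0$. To prove this part of the theorem it is enough to show that there exists
$\varepsilon^* > 0$ such that $\varepsilon < \varepsilon^*$ implies

$$T_{t_0}(\Lambda_{\varepsilon,t^*,x^*}) = T_{t_0}(\Lambda_0) = \mu(t_0),$$

and to prove this it is enough to show that

$$\min_{s} m(t_0, r, s, \Lambda_{\varepsilon,t^*,x^*}) = m(t_0, r, 0, \Lambda_{\varepsilon,t^*,x^*}) = m(t_0, r, 0, \Lambda_0). \tag{76}$$

When $x^* > \mu(t_0)$, this is immediate. Consider the case $x^* < \mu(t_0)$.
Clearly for $0 \le s < t^*$

$$m(t_0, r, s, \Lambda_{\varepsilon,t^*,x^*}) = m(t_0, r, s, \Lambda_0) \ge m(t_0, r, 0, \Lambda_0). \tag{77}$$

It is also easy to show that $s \ge t^*$ implies

$$m(t_0, r, s, \Lambda_{\varepsilon,t^*,x^*}) \ge m(t_0, r, t^*, \Lambda_{\varepsilon,t^*,x^*}) \tag{78}$$

and for $r \le 0$ and for all $s$

$$m(t_0, r, s, \Lambda_{\varepsilon,t^*,x^*}) \le m(t_0, 0, s, \Lambda_{\varepsilon,t^*,x^*}). \tag{79}$$

Then, using (77)-(79) and the fact that $m(t_0, 0, 0, \Lambda_0) = m(t_0, 0, 0, \Lambda_{\varepsilon,t^*,x^*})$, in order to prove (76) it is
enough to show that

$$m(t_0, 0, t^*, \Lambda_{\varepsilon,t^*,x^*}) \ge m(t_0, 0, 0, \Lambda_0). \tag{80}$$

Recall that $m(t_0, 0, s, \Lambda_{\varepsilon,t^*,x^*})$ is the solution of

$$\varepsilon\psi(x^* - m)\,1(t_0 \le t^* \le t_0 + s) + (1 - \varepsilon)V(s, m) = 0,$$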
where

$$V(s, m) = \int_{-\infty}^{\infty}\int_{t_0}^{t_0 + s} \psi(\mu(t) + u - m)\,d\Lambda_0(t, u).$$

Clearly V{t* ,m{tQ,0,t* , Aq)) = and since m{r,t, Aq) and V{r, t,m) are both increasing in t we get 
V{r,t* ,m{Q,0, Aq) < O.Then, since ip is bounded, we can find e* , so that for e < e* we have 

{x* - m) l{to <t* <to + s) + {l~ £)V{s, to(0, 0, Aq)) < 0, 

and therefore $m(0, t^*, \Lambda_{\varepsilon,t^*,x^*}) > m(0, 0, \Lambda_0)$. Then (80) holds and this proves the Theorem for the case
$t^* > t_0$. The proof for the case $t^* < t_0$ is similar.



10 Proof of Theorem 4. 

Without loss of generality we can assume that $\sigma_0 = 1$. It is easy to see that the least favorable contaminating
distribution is the point mass $\Lambda^* = \delta_{(t_0,x_0)}$, where $x_0$ tends to $-\infty$ or to $\infty$.
A necessary and sufficient condition for $\varepsilon < \varepsilon^*$ is that the equation



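$$\varepsilon\psi(x_0 - m) + (1 - \varepsilon)\int_{\{t \le t_0\}}\int_{-\infty}^{\infty} \psi(\mu(t) + u - m)h(t)g(u)\,du\,dt = 0 \tag{81}$$

has a bounded solution $m$ for all $x_0 < \mu(t_0)$ and that the equation

$$\varepsilon\psi(x_0 - m) + (1 - \varepsilon)\int_{\{t \ge t_0\}}\int_{-\infty}^{\infty} \psi(\mu(t) + u - m)h(t)g(u)\,du\,dt = 0 \tag{82}$$

has a solution for all $x_0 > \mu(t_0)$.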

Taking $x_0 \to -\infty$ we find that a sufficient condition for the existence of a bounded solution of (81) for
all $x_0 < \mu(t_0)$ is that

$$-\varepsilon k + (1 - \varepsilon)kH(t_0) > 0,$$

and this is equivalent to

$$\varepsilon < \frac{H(t_0)}{1 + H(t_0)}.$$

Taking $x_0 \to \infty$ we obtain that a sufficient condition for the existence of a solution of (82) for all $x_0 > \mu(t_0)$
is that

$$\varepsilon k - (1 - \varepsilon)k(1 - H(t_0)) < 0,$$

and this is equivalent to

$$\varepsilon < \frac{1 - H(t_0)}{2 - H(t_0)}.$$
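The two sufficient conditions, $-\varepsilon k + (1-\varepsilon)kH(t_0) > 0$ and $\varepsilon k - (1-\varepsilon)k(1 - H(t_0)) < 0$, rearrange to $\varepsilon < H(t_0)/(1+H(t_0))$ and $\varepsilon < (1-H(t_0))/(2-H(t_0))$. The sketch below evaluates both thresholds for a given value of $H(t_0)$ and checks the rearrangement numerically. The function names are hypothetical and the constant $k = 1.345$ is an arbitrary positive value used only for the check.

```python
def contamination_thresholds(h_t0):
    """Return the two upper bounds on epsilon implied by the sufficient conditions."""
    low_outlier = h_t0 / (1.0 + h_t0)            # from x0 -> -infinity
    high_outlier = (1.0 - h_t0) / (2.0 - h_t0)   # from x0 -> +infinity
    return low_outlier, high_outlier


def conditions_hold(eps, h_t0, k=1.345):
    """Evaluate the two original inequalities directly (k > 0 is an assumed bound on psi)."""
    first = -eps * k + (1.0 - eps) * k * h_t0 > 0
    second = eps * k - (1.0 - eps) * k * (1.0 - h_t0) < 0
    return first, second


low_mid, high_mid = contamination_thresholds(0.5)
low_edge, high_edge = contamination_thresholds(0.1)
```

At the median of the design, $H(t_0) = 1/2$, both thresholds equal $1/3$; near the boundary of the design one of them becomes small, so less contamination can be tolerated there.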
The theorem follows from ([5T|) and 